Search | Portal Regional da BVS

Classification of protein-protein interaction full-text documents using text and citation network features.

Kolchinsky, Artemy; Abi-Haidar, Alaa; Kaur, Jasleen; Hamed, Ahmed Abdeen; Rocha, Luis M.

IEEE/ACM Trans Comput Biol Bioinform ; 7(3): 400-11, 2010.

Article in English | MEDLINE | ID: mdl-20671313

ABSTRACT

We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top-performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics.
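Three of the performance measures named in this abstract (Accuracy, Balanced F-Score, and the Matthews Correlation Coefficient) can be computed directly from the confusion counts of a binary classifier. The following is a minimal sketch using the standard textbook definitions, not the challenge's official evaluation script; the function name `binary_metrics` and the example counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, balanced F-score (F1), and MCC from confusion counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Undefined ratios fall back to 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews Correlation Coefficient, in the range [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts, for illustration only (not results from the paper).
scores = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(scores)
```

The Area Under the interpolated precision and recall Curve is omitted here because it requires a ranked list of documents rather than confusion counts alone.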

Subjects

Abstracting and Indexing/classification, Computational Biology/methods, Data Mining/methods, Protein Interaction Mapping/classification, Algorithms, Bibliographic Databases, Computer Neural Networks, Periodicals as Topic

Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks.

Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana; Radivojac, Predrag; Rechtsteiner, Andreas; Verspoor, Karin; Wang, Zhiping; Rocha, Luis M.

Genome Biol ; 9 Suppl 2: S11, 2008.

Article in English | MEDLINE | ID: mdl-18834489

ABSTRACT

BACKGROUND: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks. RESULTS: Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of measures of performance used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool that we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. CONCLUSION: Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the novel approach is based on a rather lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, the expansion of word features with word proximity networks is shown to be useful, although the need for some improvements is discussed.
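The mean reciprocal rank mentioned in the results can be illustrated with a short sketch. It uses the usual definition (the average, over queries, of one over the rank of the first correct item, counting 0 when no correct item is returned); the exact variant used in the challenge evaluation is not stated in this abstract, and the function name and toy data below are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, correct_sets):
    """Average of 1/rank of the first correct item in each ranked list.

    ranked_lists: one ranked list of candidate passages per query.
    correct_sets: one set of correct passages per query (same order).
    A query with no correct passage in its list contributes 0.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, correct in zip(ranked_lists, correct_sets):
        for position, item in enumerate(ranked, start=1):
            if item in correct:
                total += 1.0 / position
                break
    return total / len(ranked_lists)


# Toy data for illustration only (not data from the paper).
ranked = [["a", "b", "c"], ["x", "y"], ["p", "q"]]
correct = [{"b"}, {"x"}, {"z"}]
print(mean_reciprocal_rank(ranked, correct))
```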

Subjects

Abstracting and Indexing, Bibliographic Databases, Semantics, Algorithms, Area Under Curve, Linear Models, Protein Binding
